Speech segmentation with a neural encoder model of working memory

Authors

  • Micha Elsner
  • Cory Shain
Abstract

We present the first unsupervised LSTM speech segmenter as a cognitive model of the acquisition of words from unsegmented input. Cognitive biases toward phonological and syntactic predictability in speech are rooted in the limitations of human memory (Baddeley et al., 1998); compressed representations are easier to acquire and retain in memory. To model the biases introduced by these memory limitations, our system uses an LSTM-based encoder-decoder with a small number of hidden units, then searches for a segmentation that minimizes autoencoding loss. Linguistically meaningful segments (e.g. words) should share regular patterns of features that facilitate decoder performance in comparison to random segmentations, and we show that our learner discovers these patterns when trained on either phoneme sequences or raw acoustics. To our knowledge, ours is the first fully unsupervised system able to segment both symbolic and acoustic representations of speech.
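
The idea of searching for a segmentation that minimizes a reconstruction loss can be illustrated with a toy sketch. This is not the authors' system: the LSTM encoder-decoder is replaced by a hypothetical stand-in cost function (`toy_cost`, with an assumed set of "familiar" chunks that a limited memory reconstructs cheaply), and the search is a plain dynamic program rather than whatever search procedure the paper uses. The names `best_segmentation`, `toy_cost`, `FAMILIAR`, and `max_len` are all illustrative assumptions.

```python
from typing import Callable, List, Tuple


def best_segmentation(
    seq: str, cost: Callable[[str], float], max_len: int = 6
) -> Tuple[List[str], float]:
    """Return the segmentation of `seq` with the lowest summed segment cost.

    `cost` is a stand-in for a per-segment autoencoding loss; `max_len`
    caps segment length, loosely mimicking a bounded memory.
    """
    n = len(seq)
    best = [0.0] + [float("inf")] * n  # best[i]: lowest cost of seq[:i]
    back = [0] * (n + 1)               # back[i]: start of the last segment
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            candidate = best[start] + cost(seq[start:end])
            if candidate < best[end]:
                best[end] = candidate
                back[end] = start
    # Follow the back-pointers from the end to recover the segments.
    segments: List[str] = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(seq[start:end])
        end = start
    segments.reverse()
    return segments, best[n]


# Hypothetical stand-in for reconstruction loss: chunks the "memory" has
# compressed well are cheap; anything else costs 2.0 per symbol.
FAMILIAR = {"the", "dog", "ran"}


def toy_cost(segment: str) -> float:
    return 1.0 if segment in FAMILIAR else 2.0 * len(segment)


segments, total = best_segmentation("thedogran", toy_cost)
print(segments, total)
```

In a real learner the cost would come from a trained model rather than a fixed lookup, so the segmentation and the model would have to be learned jointly; the sketch only shows the "pick the boundaries that make reconstruction cheapest" step.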

Similar resources

Relationship between Working Memory, Auditory Perception and Speech Intelligibility in Cochlear Implanted Children of Elementary School

Objectives: This study examined the relationship between working and short-term memory performance, and their effects on cochlear implant outcomes (speech perception and speech production) in cochlear implanted children aged 7-13 years. The study also compared the memory performance of cochlear implanted children with their normal hearing peers. Methods: Thirty-one cochlear impl...

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Action segmentation, as a milestone towards building automatic systems to understand untrimmed videos, has received considerable attention in recent years. It is typically modeled as a sequence labeling problem but contains intrinsic and sufficient differences from text parsing or speech processing. In this paper, we introduce a novel hybrid temporal convolutional and recurrent network ...

A Study of the Cognitive Functions of Students with Stuttering

Objective: Stuttering is one of the most common speech disorders and causes many complications in children and adults. This disorder involves behavioral, cognitive and emotional interactions. The purpose of the current study is therefore to investigate the cognitive functions of students with stuttering. Materials & Methods: A descriptive study comprising 30 students (8 females and 22 males) fr...

Multitask Learning with CTC and Segmental CRF for Speech Recognition

Segmental conditional random fields (SCRFs) and connectionist temporal classification (CTC) are two sequence labeling methods used for end-to-end training of speech recognition models. Both models define a transcription probability by marginalizing decisions about latent segmentation alternatives to derive a sequence probability: the former uses a globally normalized joint model of segment labe...

The Diagnosis of Brucellosis in Rafsanjan City Using Deep Auto-Encoder Neural Networks

Introduction: Brucellosis is considered as one of the most important common infectious diseases between humans and animals. Considering the endemic nature of brucellosis and the existence of numerous reports of human and animal cases of brucellosis in Iran, the incidence of human brucellosis in Rafsanjan city was determined in the last 3 years (2016–2018). The main objective of this study was t...

Publication date: 2017